Disentangling Style and Speaker Attributes for TTS Style Transfer

Abstract

End-to-end neural TTS has shown improved performance in speech style transfer. However, the improvement is still limited by the training data available for both target styles and speakers. Additionally, degraded performance is observed when the trained TTS model tries to transfer speech to a target style from a new speaker with an unknown, arbitrary style. In this paper, we propose a new approach to seen and unseen style transfer, trained on disjoint, multi-style datasets, i.e., datasets in which each style is recorded by one individual speaker in multiple utterances. An inverse autoregressive flow (IAF) technique is first introduced to improve variational inference for learning an expressive style representation. A speaker encoder network is then developed for learning a discriminative speaker embedding, which is trained jointly with the rest of the neural TTS modules. The proposed approach is trained effectively with six specifically designed objectives: reconstruction loss, adversarial loss, style distortion loss, cycle consistency loss, style classification loss, and speaker classification loss. Experiments demonstrate, both objectively and subjectively, the effectiveness of the proposed approach for seen and unseen style transfer tasks. The performance of our approach is superior to, and more robust than, that of four other reference systems from prior art.
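The abstract does not describe how the IAF is built. The sketch below is a rough reference only. It shows a generic IAF posterior in the gated form of Kingma et al. (2016), "Improving Variational Inference with Inverse Autoregressive Flow", and makes several assumptions. The initial sample is a diagonal Gaussian. The autoregressive network is a toy linear map with strictly lower-triangular weights. No encoder context vector is used. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def autoregressive_params(z, w_m, b_m, w_s, b_s):
    """Hypothetical toy autoregressive net: m[i] and s[i] use only z[:i].

    Weight entries with j >= i are never read, which enforces the
    autoregressive mask that a MADE-style network would provide.
    """
    d = len(z)
    m = [b_m[i] + sum(w_m[i][j] * z[j] for j in range(i)) for i in range(d)]
    s = [b_s[i] + sum(w_s[i][j] * z[j] for j in range(i)) for i in range(d)]
    return m, s


def iaf_step(z, log_q, params):
    """One gated IAF update: z' = sigma * z + (1 - sigma) * m."""
    m, s = autoregressive_params(z, *params)
    sig = [sigmoid(si) for si in s]
    z_new = [g * zi + (1.0 - g) * mi for g, zi, mi in zip(sig, z, m)]
    # dz'/dz is lower-triangular with diagonal sig, so the density
    # correction is log q(z') = log q(z) - sum(log sig).
    log_q_new = log_q - sum(math.log(g) for g in sig)
    return z_new, log_q_new


def random_step_params(d, rng, scale=0.1, gate_bias=2.0):
    """Hypothetical random init; a positive gate_bias pushes sigma toward 1."""
    w_m = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]
    w_s = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]
    b_m = [0.0] * d
    b_s = [gate_bias] * d
    return w_m, b_m, w_s, b_s


def sample_posterior(mu, log_sigma, flow_params, rng):
    """Draw z_T through the flow and return it with log q(z_T)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(ls) * e for m, ls, e in zip(mu, log_sigma, eps)]
    # Log density of the diagonal-Gaussian starting sample z_0.
    log_q = -sum(ls + 0.5 * e * e + 0.5 * math.log(2.0 * math.pi)
                 for ls, e in zip(log_sigma, eps))
    for params in flow_params:
        z, log_q = iaf_step(z, log_q, params)
    return z, log_q


# Usage: a 4-dimensional latent passed through two IAF steps.
rng = random.Random(0)
flows = [random_step_params(4, rng) for _ in range(2)]
z_T, log_q_T = sample_posterior([0.0] * 4, [0.0] * 4, flows, rng)
```

Each m_i and sigma_i depends only on earlier coordinates. As a result, the Jacobian of every step is triangular, and the density correction reduces to a sum of log sigma_i. This keeps a flexible posterior cheap to evaluate during training.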

Similar resources

Separating Style and Content for Generalized Style Transfer

Neural style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, and the learned model is thus not generalizable to new styles. We here attempt to separate the representations for styles and contents, and propose a generalized style transfer network consisting of style encoder, content encoder, m...

Adding speaking style to a TTS system

This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, spee...

Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation

One of the biggest challenges in speech synthesis is the production of contextually-appropriate naturally sounding synthetic voices. This means that a Text-To-Speech system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identifi...

Artistic Style Transfer for Videos

In the past, manually re-drawing an image in a certain artistic style required a professional artist and a long time. Doing this for a video sequence single-handed was beyond imagination. Nowadays computers provide new possibilities. We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence. We make use of recent advances in style transfe...

Stereoscopic Neural Style Transfer

This paper presents the first attempt at stereoscopic neural style transfer, which responds to the emerging demand for 3D movies or AR/VR. We start with a careful examination of applying existing monocular style transfer methods to left and right views of stereoscopic images separately. This reveals that the original disparity consistency cannot be well preserved in the final stylization result...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2022

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2022.3145297